Echo Cancellation in VoIP Using a Block-Based Adaptive Filter

 

M. Jyothirmai1, A. Mounika2, K. Prathima3, K. Navya Sree4

1Department of ECE, St. Peter’s Engineering College, Medchal, Hyderabad, Telangana, India

*Corresponding Author Email: jyothirmai@stpetershyd.com

 

ABSTRACT:

Acoustic echo cancellation (AEC) has become a necessity in present-day conferencing systems in order to improve the audio quality of hands-free communication. Over the last few years, many researchers and manufacturers have developed AEC algorithms for telecommunication systems so as to improve their services. Many factors influence the design of an AEC system, for instance computational complexity and memory usage. This paper proposes a state-space model and the associated method for the partitioned-block filtering structure, which is particularly relevant in practice. This approach allows the use of considerably longer filter lengths and a flexible design and implementation of AEC devices for VoIP applications.

 

KEYWORDS: AEC, Adaptive Filtering, State Space Architecture.

 


INTRODUCTION:

Over the past few years, the telecommunication industry has witnessed many changes, and new services such as VoIP have become very common. Acoustic echo is the sound of the talker's own voice returning through the phone receiver while talking. It originates in a local audio loop, which occurs when a microphone picks up sound from a loudspeaker and sends it back to the originating participant [1]. Figure 1 shows a general setup in which echo is created. The far-end signal x(n) is played by the near-end loudspeaker. The microphone signal y(n) then picks up the echo d(n) together with the near-end signal s(n), which comprises local speech and background noise. The received far-end signal is available as a reference signal for the echo canceller, which uses it to generate a replica of the echo, $\hat{d}(n)$ [2]. This replica is subtracted from the near-end signal plus echo to yield the transmitted signal $z(n) = s(n) + d(n) - \hat{d}(n)$. Ideally, the residual echo $d(n) - \hat{d}(n)$ will be very small after echo cancellation.

 

 

Fig 1 Acoustic echo path setup

A variety of adaptive filter structures and adaptation control methods have been proposed for operation in adverse environments. Because of time-varying acoustics and echo-path undermodeling, the AEC is not always able to eliminate the echo completely, and thus a residual echo suppressor (RES) is introduced to remove the remaining echo components. In modern communication systems such as Voice-over-IP services or high-quality video conferencing systems, sampling frequencies of 16 kHz and higher are used. This implies a notable increase in the computational complexity of the AEC. Furthermore, the convergence speed of time-domain adaptive filters is usually not sufficient for high sampling rates and long echo paths. Besides sub-band adaptive filters, frequency-domain adaptive filters using block processing are well-known solutions to both issues. However, the required transform length can become quite large for long echo paths, which may cause numerical problems, e.g., when the AEC is implemented in fixed-point arithmetic on embedded devices [3]. In this case, approaches based on partitioned-block filtering are more suitable, as they allow for flexible designs: the transform length can be chosen separately from the time span covered by the AEC, so that filter lengths that are not powers of two can be used with power-of-two fast Fourier transform implementations. In addition, reducing the transform length also reduces the algorithmic delay caused by the RES in the post-processing stage. Alternatively, this delay can be avoided by approximating the RES filter with a partitioned-block structure.

 

The step-size parameter that controls the adaptation of the coefficients is generally a critical component of the AEC. The alternative approach of the frequency-domain adaptive Kalman filter relies on a state-space model of the acoustic echo path to derive a robust and efficient frequency-domain adaptive filter with comprehensive step-size control. This method is likewise applicable to multichannel and nonlinear adaptive filtering problems. This paper extends the adaptive Kalman filter to the partitioned-block filtering structure, which is of practical importance for the widely varying acoustic conditions in which AECs are applied.

 

ADAPTIVE FILTERS:

As shown in Fig. 1, the adaptive filter produces an estimate of the echo. In the generic adaptive-filter notation of this section, y(n) denotes this filter output and d(n) the desired input signal. The estimated echo is subtracted from d(n) to generate the error signal [4],

e(n) = d(n) - y(n)

 

The error signal is fed back to the adaptive filter so that it can adjust its transfer function to achieve optimum performance.

 

Least mean square (LMS) is a stochastic-gradient-based algorithm. It is the most widely used algorithm in adaptive filtering and is known for its ease of implementation. The LMS algorithm is very sensitive to the power and spectral characteristics of the input signal, which makes it hard to choose the step size and to guarantee stability. The normalized LMS (NLMS) algorithm was developed to resolve this problem by normalizing the step size with the power of the input signal, which makes the convergence rate independent of the signal power.
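The normalization described above can be illustrated with a minimal NLMS sketch. It is not taken from the paper: the function name `nlms`, the step size `mu`, and the regularization constant `eps` are assumptions chosen for illustration, and the filter operates sample by sample in the time domain.

```python
import numpy as np

def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Identify an FIR system from input x and desired signal d with NLMS.

    Hypothetical helper for illustration; mu and eps are assumed values.
    Returns the final coefficient vector and the error signal e(n) = d(n) - y(n).
    """
    w = np.zeros(num_taps)      # adaptive filter coefficients
    buf = np.zeros(num_taps)    # buf[j] holds x(n - j)
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf             # filter output (echo estimate)
        e[n] = d[n] - y         # error signal
        # Step size normalized by the instantaneous input power.
        w += mu * e[n] * buf / (buf @ buf + eps)
    return w, e
```

Because the update is divided by the input power, scaling x by a constant leaves the convergence behavior essentially unchanged, which is the property the paragraph above refers to.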

 

A gradient-based LMS algorithm (Widrow-Hoff) or a recursive least squares (RLS) algorithm is difficult to implement in full band. The dynamic character of speech, including intervals of complete silence, poses a problem for adaptive filtering. Additionally, the far-from-white spectrum of speech slows down the adaptation, and the resulting long convergence time makes the system sensitive to changes in the acoustic room response. Finally, near-end speech and background noise, if present, also place demands on the system design [5].

 

State-Space Partitioned-Block Echo Path Model:

Using the representation depicted in Fig. 1, the microphone signal y(n) can be represented as the sum of the near-end signal s(n) and an echo signal d(n). As the echo signal results from the discrete-time convolution of the loudspeaker input signal x(n) with the acoustic echo path w(n), we have

y(n) = x(n) * w(n) + s(n)

where w(n) denotes the filter coefficients.

Here we assume that the acoustic echo path is sufficiently well modeled by a finite impulse response (FIR) filter. Aiming at a partitioned-block implementation, we divide the FIR filter with coefficients w(n) into B partitions of length L. The coefficient vector $w_b(k)$ of length L then contains the coefficients of the b-th partition.

 

We further introduce an input vector $x_b(k)$ of length M > L for the b-th partition, with block time index k and frame shift R:

$$x_b(k) = [x(kR - bL - M + 1), \dots, x(kR - bL)]^T$$

Then, the corresponding complex-valued excitation matrix $X_b(k)$ in the frequency domain is obtained as

$$X_b(k) = \mathrm{diag}\{F_M \, x_b(k)\}$$

where $F_M$ is the Fourier matrix of size M × M. Here, diag{a} denotes a diagonal matrix with the vector a on its main diagonal.

 

The elements $X_b(m,k)$ on the main diagonal of $X_b(k)$ are given by

$$X_b(m,k) = \sum_{n=0}^{M-1} x(kR - bL - M + 1 + n) \, e^{-j 2\pi m n / M}$$

where m denotes the frequency index. The frequency-domain representation $W_b(k)$ of the filter partitions is given by

 

 

where the constraint is applied that only the first L coefficients of the time-domain correspondence are non-zero:

 

Applying the overlap-and-save method for computing a block of microphone signal, we have

 

 

 

where $Q_V$ denotes the windowing matrix and $I_V$ is the identity matrix of size V × V. The signal vectors y(k) and s(k) contain the V = M - L + 1 latest samples of the microphone signal and the near-end signal, respectively:

 

In the following, V denotes the number of valid samples of y(k) that result from the fast convolution of the loudspeaker signal with the echo path.
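The partitioned overlap-and-save convolution can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name `pb_overlap_save`, the zero-padded signal history, and the sample indexing of each partition's input frame are choices made here. Each frame takes the last V = M - L + 1 outputs of the inverse DFT as valid samples and keeps the R newest of them.

```python
import numpy as np

def pb_overlap_save(x, w, L, M, R):
    """Partitioned-block overlap-save filtering of x with the FIR filter w.

    Assumed conventions (hypothetical): w has B*L taps, each partition is
    zero-padded to the DFT length M, and partition b sees the input delayed
    by b*L samples. Returns floor(len(x)/R)*R output samples.
    """
    w = np.asarray(w, dtype=float)
    B = len(w) // L
    V = M - L + 1                       # valid samples per frame
    if len(w) != B * L or R > V:
        raise ValueError("need len(w) == B*L and R <= M - L + 1")
    # Frequency-domain partitions: only the first L time-domain taps are non-zero.
    W = np.fft.fft(w.reshape(B, L), n=M, axis=1)
    pad = (B - 1) * L + M - 1           # zero history before the first sample
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    n_frames = len(x) // R
    y = np.zeros(n_frames * R)
    for k in range(n_frames):
        end = pad + (k + 1) * R         # one past the newest input sample
        acc = np.zeros(M, dtype=complex)
        for b in range(B):
            frame = xp[end - b * L - M : end - b * L]   # input frame of partition b
            acc += np.fft.fft(frame) * W[b]             # multiply-accumulate per bin
        valid = np.fft.ifft(acc).real[-V:]  # last V samples are free of wrap-around
        y[k * R : (k + 1) * R] = valid[-R:] # keep only the R new samples
    return y
```

With the values used later in the paper (L = R = 256, M = 512), V = 257 exceeds R by one sample, so one valid sample per frame repeats a value already computed in the previous frame.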

 

Note that V > R is possible for specific choices of the DFT length M and the partition size L. In that case, y(k) contains only R new samples, whereas the V - R other valid samples have already been computed in the previous frame. The frequency-domain representation of the microphone signal is obtained by left-multiplication with the Fourier matrix.

 

 

where $C_b(k) = F_M Q_V F_M^{-1} X_b(k)$ has been introduced.

 

We now define variables that are useful for the presentation of the Kalman filter. Let $\hat{W}_b(k)$ denote the adaptive filter coefficients of the b-th partition. The frequency-domain coefficient error vector of the b-th partition is then given as

$$W_{b,r}(k) = W_b(k) - \hat{W}_b(k)$$

 

and its covariance matrix is given by

 

 

where $\mathcal{E}\{\cdot\}$ denotes the expectation operator.

 

Using the last three equations, the frequency-domain error E(k) at the output of the echo canceller can be expressed in terms of the coefficient error vectors $W_{b,r}(k)$, i.e.

 

 

 

Generally, the acoustic echo path and, as a result, its coefficients $W_b(k)$ are considered to be slowly time-varying. We adopt a basic stochastic Markov model for the dynamic behavior of the filter partitions $W_b(k)$ according to

 

 

 

Exact Kalman filter solution:

To derive the Kalman filter for the partitioned-block state-space architecture, we can reuse the derivation proposed for multichannel adaptive filtering based on acoustic state-space modeling [6]. It is important to note, however, that the partitioned-block filter is then interpreted as a specific multiple-input single-output system, in which the input of the b-th channel (the b-th partition) is given by $X_b(k)$.

 

The equations describing the partitioned-block version of the Kalman filter are then given for the b-th partition as

 

 

with the so-called Kalman gain of the b-th partition.
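To make the role of the Kalman gain concrete, the sketch below shows a textbook time-domain Kalman filter for FIR system identification under a random-walk coefficient model. It is a deliberately simplified stand-in and not the partitioned-block frequency-domain filter of this paper: the function name `kalman_identify`, the process-noise variance `q`, and the observation-noise variance `r` are assumptions made for illustration.

```python
import numpy as np

def kalman_identify(x, d, num_taps, q=1e-8, r=1e-2):
    """Time-domain Kalman filter for identifying FIR coefficients.

    Assumed model (illustrative): w(n+1) = w(n) + process noise (variance q),
    d(n) = x_n^T w(n) + observation noise (variance r).
    """
    w = np.zeros(num_taps)        # coefficient estimate
    P = np.eye(num_taps)          # coefficient error covariance
    buf = np.zeros(num_taps)      # buf[j] holds x(n - j)
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        P = P + q * np.eye(num_taps)   # prediction step of the random-walk model
        e[n] = d[n] - buf @ w          # a-priori error (innovation)
        Pu = P @ buf
        k = Pu / (buf @ Pu + r)        # Kalman gain
        w = w + k * e[n]               # coefficient update
        P = P - np.outer(k, Pu)        # covariance update (P is symmetric)
    return w, e
```

The gain shrinks automatically as the error covariance P decreases or as the assumed observation-noise variance grows, so it acts like a step size that adapts itself. This is the behavior the frequency-domain formulation exploits for step-size control.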

 

Residual Echo Suppression Part:

Practically the AEC is not always able to completely cancel the echo. Even when the length of the adaptive filter can sufficiently capture the room impulse response, residual echoes remain because of the time-varying nature of the acoustic echo path and the presence of observation noise in the signal [6].
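A common way to suppress such residual echo is a real-valued gain per frequency bin applied to the AEC output spectrum. The sketch below computes a Wiener-type gain as the ratio of the near-end PSD to the PSD of the AEC output and applies it. It assumes both PSDs are already available; in practice they must be estimated. The function names and the `gain_floor` parameter are hypothetical.

```python
import numpy as np

def wiener_res_gain(psd_near, psd_out, gain_floor=0.0, eps=1e-12):
    """Per-bin Wiener-type suppression gain, limited to [gain_floor, 1]."""
    psd_near = np.asarray(psd_near, dtype=float)
    psd_out = np.asarray(psd_out, dtype=float)
    gain = psd_near / (psd_out + eps)   # eps avoids division by zero
    return np.clip(gain, gain_floor, 1.0)

def apply_res(E, psd_near, psd_out, gain_floor=0.0):
    """Apply the suppression gain to the AEC output spectrum E."""
    return wiener_res_gain(psd_near, psd_out, gain_floor) * np.asarray(E)
```

Bins dominated by residual echo (near-end PSD much smaller than the output PSD) are attenuated strongly, while bins dominated by near-end speech pass almost unchanged.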

 

In order to remove these residual echoes, a suppression filter H (m, k) is commonly applied to the output spectrum:

 

 

It is well known that the Wiener solution for the residual echo suppression filter H(m, k) is given by

$$H(m,k) = \frac{\Phi_{SS}(m,k)}{\Phi_{EE}(m,k)}$$

where $\Phi_{SS}(m,k)$ and $\Phi_{EE}(m,k)$ denote the power spectral densities (PSDs) of the near-end signal s(n) and the AEC output e(n), respectively. We now relate the step-size parameters $\mu_b(m,k)$ of the partitions, i.e., the elements on the main diagonal of the step-size matrix $\mu_b(k)$, to the optimum residual echo suppression filter. The elements of the diagonal step-size matrix are given as

 

 

The following relations between the PSDs and the elements on the main diagonal of the corresponding covariance matrices approximately hold [7], [8]:

Substituting these relations into the equation for the diagonal step-size matrix, we get

 

It is observed that the PSD of the error signal can be expressed in terms of the PSD of the coefficient error and the PSD of the near-end signal:

 

Using these equations we obtain a simple relation between the step-size parameters of each partition and the optimum echo suppression filter:

 

 

SIMULATION RESULTS:

We present simulation results, obtained with Xilinx software, for the proposed VoIP algorithm based on the Kalman AEC (K-AEC) and RES. The echo signal is simulated by convolving clean speech with a measured impulse response with a reverberation time of about 300 ms. The signal-to-echo ratio during double-talk is about 0 dB, and white noise has been added to the microphone signal to yield a near-end SNR of 25 dB.
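Adding white noise at a prescribed SNR, as in the setup above, amounts to scaling the noise so that the ratio of signal power to noise power matches the target. A minimal sketch follows. The helper name `add_white_noise` and the use of a seeded NumPy generator are assumptions for illustration, not details from the paper.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Return (noisy signal, scaled noise) with the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(len(signal))
    sig_pow = np.mean(signal ** 2)
    noise_pow = np.mean(noise ** 2)
    # Scale so that sig_pow / mean(scaled_noise**2) == 10**(snr_db / 10).
    scale = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    scaled = scale * noise
    return signal + scaled, scaled
```

The same scaling idea applies to the signal-to-echo ratio: scaling the echo relative to the near-end speech so that their power ratio is one gives 0 dB.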

 

Fig 2 (a) Microphone signal (b) Near-end speech signal

 

The sampling rate for VoIP is 16 kHz. The AEC was implemented with a partition size equal to the frame shift, i.e., L = R = 256 samples. The DFT length is M = 512. The AEC uses B = [1, 2, 5, 10, 15] partitions, which correspond to modeled time spans of [16, 32, 80, 160, 240] ms, respectively. Adaptive filters with up to 3840 taps are used here, which is significantly more than in previous Kalman filters [6, 20, 21], where fewer than 1000 taps were used.
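The quoted time spans and tap counts follow directly from the partition size and the sampling rate. The short check below reproduces them. The variable names are chosen here for illustration.

```python
fs = 16000                      # sampling rate in Hz
L = 256                         # partition size in samples (equal to frame shift R)
M = 512                         # DFT length
partitions = [1, 2, 5, 10, 15]  # values of B used in the simulations

taps = [b * L for b in partitions]          # total filter length per setting
spans_ms = [1000.0 * t / fs for t in taps]  # modeled echo-path time span in ms
valid_samples = M - L + 1                   # V, valid samples per frame

print(taps)           # [256, 512, 1280, 2560, 3840]
print(spans_ms)       # [16.0, 32.0, 80.0, 160.0, 240.0]
print(valid_samples)  # 257
```

Each partition covers 256 / 16000 s = 16 ms, so the longest setting with B = 15 models 240 ms of the echo path using 3840 taps.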

 

We observe that the achievable system distance decreases as the number of partitions increases. Two R-AEC variants are considered: one without a near-end voice activity detector (NE-VAD) and one with an ideal NE-VAD. The ideal NE-VAD avoids divergence of the R-AEC during double-talk or near-end single-talk by freezing the adaptation while the near-end speaker is active.

 

However, in practice the system distance of the R-AEC will be higher due to a non-ideal NE-VAD. The step-size control in the K-AEC prevents the adaptive filter from diverging during double-talk or near-end single-talk, while still allowing the AEC to converge sufficiently fast to an accurate solution. Its performance is almost equivalent to that of the R-AEC with an ideal NE-VAD.
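The system distance used in this comparison is commonly defined as the normalized squared misalignment between the true and the estimated echo path, expressed in dB. The sketch below assumes that definition, which this section does not spell out. The function name `system_distance_db` is hypothetical, and the shorter vector is zero-padded so that unmodeled taps count as error.

```python
import numpy as np

def system_distance_db(w_true, w_est):
    """Normalized misalignment 10*log10(||w_true - w_est||^2 / ||w_true||^2)."""
    w_true = np.asarray(w_true, dtype=float)
    w_est = np.asarray(w_est, dtype=float)
    n = max(len(w_true), len(w_est))
    a = np.zeros(n)
    a[:len(w_true)] = w_true
    b = np.zeros(n)
    b[:len(w_est)] = w_est      # taps not modeled by the estimate stay zero
    return 10.0 * np.log10(np.sum((a - b) ** 2) / np.sum(a ** 2))
```

Under this definition, an estimate that covers only part of a decaying echo path cannot fall below the energy of the unmodeled tail, which is consistent with the observation that more partitions allow a lower system distance.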

 

DISCUSSION:

The table below compares the existing algorithms.

| AUTHOR | ALGORITHM | PURPOSE | ACCURACY |
|---|---|---|---|
| Vladimir M. Matić [9] | Block Frequency Domain Adaptive Filtering Algorithm | Optimal algorithm for cancelling acoustic and line echo. | 76% |
| Sanjay K. Nagendra [10] | Least Mean Square Algorithm | Reduces unwanted echo and increases the communication quality. | 75.2% |
| Constantin Paleologu [11] | Kalman Filter | Behaves like a variable step-size adaptive filter. | 68% |
| G. Schmidt and E. Hansler [3] | Block Based Adaptive Filter algorithm | Reduces the delay and controls the computational complexity. | 79.2% |

Previous work used the BFDAF (Block Frequency Domain Adaptive Filtering) and LMS (Least Mean Squares) algorithms and the Kalman filter. With BFDAF, 76% of the echo was removed when the number of samples was 4096 [9]. The LMS algorithm removed 75.2% of the echo with the same number of samples. The drawback of the LMS algorithm is its higher computational complexity, with 2N+1 multiplications and 2N additions [10]. The Kalman filter cannot be used for frequency-domain and sub-band versions combined with extensions to multichannel and nonlinear cases, and it removed 68% of the echo [11].

The results show that the proposed algorithm gives an excellent signal-to-noise ratio of 79.2%, with a considerable reduction in echo when 8000 samples were taken. Compared with the BFDAF and LMS algorithms and the Kalman filter, the Block Based Adaptive Filter algorithm has yielded better results.

 

 

Fig 3 System distances for (a) K-AEC with B=[1,2,10,15] and (b) K-AEC and R-AEC (with and without NE-VAD) with B=15, obtained for an echo control scenario

 

CONCLUSION:

The partitioned-block filter structure is widely used in implementations of acoustic echo controllers to meet simultaneous constraints on delay, computational complexity, memory requirements, and numerical stability. This paper adapted the larger framework of acoustic state-space modeling, which can comprehensively represent echo path variations and observation noise, to the partitioned-block structure. We found structural equivalence with multichannel adaptive filter models known from stereophonic echo cancellation problems. We thus used the analogy to derive, implement, and validate a complete echo cancellation and residual echo suppression system in a state-space partitioned-block architecture.

 

REFERENCES:

1        S.L. Gay and J. Benesty, Eds., “Acoustic Signal Processing for Telecommunications”, Kluwer Academic Publishers, 2000.

2        J. Benesty, T. Gansler, D.R. Morgan, M.M. Sondhi, and S.L. Gay, “Advances in Network and Acoustic Echo Cancellation”, Springer, 2001.

3        G. Schmidt and E. Hansler, “Acoustic Echo and Noise Control: A Practical Approach”, Hoboken: Wiley, 2004.

4        Gerald Enzner and Peter Vary, “Frequency-domain adaptive Kalman filter for acoustic echo control in hands-free telephones”, Applied Speech and Audio Processing, Elsevier, vol. 86, no. 6, pp. 1140–1156, June 2006.

5        Regine Le Bouquin Jeannes, Pascal Scalart, Gerard Faucon, and Christophe Beaugeant, “Combined Noise and Echo Reduction in Hands-Free Systems: A Survey”, IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 808–820, November 2001.

6        J. Wung, T.S. Wada, B.-H. Juang, B. Lee, T. Kalker, and R.W. Schafer, “A system approach to residual echo suppression in robust hands-free teleconferencing”, in Proc. of Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2011, pp. 445–448.

7        D.R. Morgan and J.C. Thi, “A delayless subband adaptive filter architecture”, IEEE Trans. Signal Process., vol. 43, no. 8, pp. 1819–1830, August 1995.

8        W. Kellermann, “Analysis and design of multirate systems for cancellation of acoustical echoes”, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), New York, April 1988, pp. 2570–2573.

9        Vladimir M. Matić and Srđan N. Abadžić, “Acoustic and line echo cancellation using adaptive filters”, 15th Telecommunications Forum TELFOR 2007, Belgrade, Serbia.

10      Sanjay K. Nagendra and Vinay Kumar S.B., “Echo Cancellation in Audio Signal using LMS Algorithm”, International Conference on Recent Trends in Engineering & Technology, 2011.

11      Constantin Paleologu, Jacob Benesty, and Silviu Ciochină, “Study of the General Kalman Filter for Echo Cancellation”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, August 2013.

 

Received on 23.02.2018            Accepted on 28.04.2018       

©A&V Publications all right reserved

Research J. Engineering and Tech. 2018;9(2): 195-200.

DOI: 10.5958/2321-581X.2018.00027.2